An Ensemble Based Top Performing Approach for NCI-DREAM Drug Sensitivity Prediction Challenge
Abstract
We consider the problem of predicting the sensitivity of cancer cell lines to new drugs by supervised learning on genomic profiles. The genetic and epigenetic characterization of a cell line provides observations on various aspects of regulation, including DNA copy number variations, gene expression, DNA methylation and protein abundance. To extract relevant information from these data types, we applied a random-forest-based approach that generates a sensitivity prediction from each type of data, then combined the per-type predictions in a linear regression model to produce the final drug sensitivity prediction. Applied to the NCI-DREAM drug sensitivity prediction challenge, our approach was a top performer among 47 teams and produced high-accuracy predictions. Our results show that incorporating multiple genomic characterizations lowered both the mean and the variance of the estimated bootstrap prediction error. We also applied our approach to the Cancer Cell Line Encyclopedia database, both to predict sensitivity and to assess its ability to extract the top targets of an anti-cancer drug. The results illustrate the effectiveness of our approach in predicting drug sensitivity from heterogeneous genomic datasets.
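The two-stage scheme described in the abstract (one random forest per data type, followed by a linear regression over the per-type predictions) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn and NumPy, uses synthetic stand-ins for the genomic data, and fits the combiner on out-of-bag predictions, which is a choice made here and is not stated in the abstract. The names `fit_integrated_model`, `predict_integrated`, `expression` and `methylation` are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression


def fit_integrated_model(views, y, n_trees=50, seed=0):
    """Fit one random forest per data type, then a linear combiner.

    views: dict mapping a data-type name to an (n_samples, n_features) array.
    y:     (n_samples,) array of drug sensitivity values.
    """
    forests = {}
    oob_columns = []
    for name, X in views.items():
        rf = RandomForestRegressor(
            n_estimators=n_trees, oob_score=True, random_state=seed
        )
        rf.fit(X, y)
        forests[name] = rf
        # Assumption: out-of-bag predictions are used as combiner inputs so
        # the linear model is not fitted on predictions the forest memorized.
        oob_columns.append(rf.oob_prediction_)
    combiner = LinearRegression().fit(np.column_stack(oob_columns), y)
    return forests, combiner


def predict_integrated(forests, combiner, views):
    """Combine the per-data-type forest predictions with the linear model."""
    columns = [forests[name].predict(views[name]) for name in forests]
    return combiner.predict(np.column_stack(columns))


# Synthetic stand-in data: two "data types" that each carry a noisy copy of
# a shared signal, plus one pure-noise feature.
rng = np.random.default_rng(0)
n_train, n_test = 80, 20
n_total = n_train + n_test
signal = rng.normal(size=n_total)
expression = np.column_stack(
    [signal + rng.normal(scale=0.5, size=n_total), rng.normal(size=n_total)]
)
methylation = np.column_stack(
    [signal + rng.normal(scale=0.5, size=n_total), rng.normal(size=n_total)]
)
sensitivity = signal + rng.normal(scale=0.1, size=n_total)

train_views = {
    "expression": expression[:n_train],
    "methylation": methylation[:n_train],
}
test_views = {
    "expression": expression[n_train:],
    "methylation": methylation[n_train:],
}

forests, combiner = fit_integrated_model(train_views, sensitivity[:n_train])
predictions = predict_integrated(forests, combiner, test_views)
```

The combiner's coefficients (`combiner.coef_`) give one weight per data type, which is one simple way to see how much each genomic characterization contributes to the final prediction in this sketch.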
Related articles
Improving Drug Sensitivity Prediction Using Different Types of Data
The algorithms and models used to address the two subchallenges that are part of the NCI-DREAM (Dialogue for Reverse Engineering Assessments and Methods) Drug Sensitivity Prediction Challenge (2012) are presented. In subchallenge 1, a bidirectional search algorithm is introduced and optimized using an ensemble scheme and a nonlinear support vector machine (SVM) is then applied to predict the ef...
IntegratedMRF: random forest-based framework for integrating prediction from different data types
Summary: IntegratedMRF is an open-source R implementation for integrating drug response predictions from various genomic characterizations using univariate or multivariate random forests that includes various options for error estimation techniques. The integrated framework was developed following superior performance of random forest based methods in NCI-DREAM drug sensitivity prediction challe...
An ensemble-based Cox proportional hazards regression framework for predicting survival in metastatic castration-resistant prostate cancer (mCRPC) patients
From March through August 2015, nearly 60 teams from around the world participated in the Prostate Cancer Dream Challenge (PCDC). Participating teams were faced with the task of developing prediction models for patient survival and treatment discontinuation using baseline clinical variables collected on metastatic castrate-resistant prostate cancer (mCRPC) patients in the comparator arm of four...
Sparse group factor analysis for biclustering of multiple data sources
Motivation: Modelling methods that find structure in data are necessary with the current large volumes of genomic data, and there have been various efforts to find subsets of genes exhibiting consistent patterns over subsets of treatments. These biclustering techniques have focused on one data source, often gene expression data. We present a Bayesian approach for joint biclustering of multiple d...
Survival Prediction with Limited Features: a Top Performing Approach from the DREAM ALS Stratification Prize4Life Challenge
Survival prediction with small sets of features is a highly relevant topic for decision-making in clinical practice. I describe a method for predicting survival of amyotrophic lateral sclerosis (ALS) patients that was developed as a submission to the DREAM ALS Stratification Prize4Life Challenge held in summer 2015 to find the most accurate prediction of ALS progression and survival. ALS is a ne...